
13 research outputs found

How to Price Shared Optimizations in the Cloud

Author: Balazinska Magdalena
Suciu Dan
Upadhyaya Prasang
Publication date: 01/01/2011

Data-management-as-a-service systems are increasingly being used in collaborative settings, where multiple users access common datasets. Cloud providers have the choice to implement various optimizations, such as indexing or materialized views, to accelerate queries over these datasets. Each optimization carries a cost and may benefit multiple users. This creates a major challenge: how to select which optimizations to perform and how to share their cost among users. The problem is especially challenging when users are selfish and will only report their true values for different optimizations if doing so maximizes their utility. In this paper, we present a new approach for selecting and pricing shared optimizations by using Mechanism Design. We first show how to apply the Shapley Value Mechanism to the simple case of selecting and pricing additive optimizations, assuming an offline game where all users access the service for the same time period. Second, we extend the approach to online scenarios where users come and go. Finally, we consider the case of substitutive optimizations. We show analytically that our mechanisms induce truthfulness and recover the optimization costs. We also show experimentally that our mechanisms yield higher utility than the state-of-the-art approach based on regret accumulation. Comment: VLDB201
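To make the cost-sharing idea concrete, here is a minimal sketch of the classic equal-split Shapley-value cost-sharing mechanism for a single optimization in the offline setting, assuming each user submits one bid and the optimization has a known cost. Since additive optimizations are valued independently, one plausible reading is that the same routine runs once per optimization. The function name `shapley_value_mechanism` and the bid format are illustrative assumptions, not the paper's interface.

```python
from __future__ import annotations


def shapley_value_mechanism(cost: float, bids: dict[str, float]) -> tuple[set[str], float]:
    """Decide whether to build one optimization and what each served user pays.

    Repeatedly offers every remaining user an equal share of the cost and
    drops anyone whose bid is below that share. Returns the set of users who
    get the optimization and the price each pays; an empty set means the
    optimization is not built.
    """
    served = set(bids)
    while served:
        share = cost / len(served)
        remaining = {user for user in served if bids[user] >= share}
        if remaining == served:
            return served, share  # everyone left can afford an equal share
        served = remaining  # shares only grow, so dropped users never return
    return set(), 0.0
```

With a cost of 100 and bids of 60, 55 and 10, the third user drops out at the first equal share (about 33.3), and the remaining two each pay 50, so the payments exactly cover the cost. Each user's price depends only on how many others stay, not on the size of their own bid.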

arXiv.org e-Print Archive

CiteSeerX

ConfErr: A Tool for Assessing Resilience to Human Configuration Errors

Author: Candea George
Keller Lorenzo
Upadhyaya Prasang
Publication date: 01/01/2008

We present ConfErr, a tool for testing and quantifying the resilience of software systems to human-induced configuration errors. ConfErr uses human error models rooted in psychology and linguistics to generate realistic configuration mistakes; it then injects these mistakes and measures their effects, producing a resilience profile of the system under test. The resilience profile, which succinctly captures how sensitive the target software is to different classes of configuration errors, can be used to improve the software or to compare systems with each other. ConfErr is highly portable, because all mutations are performed on abstract representations of the configuration files. Using ConfErr, we found several serious flaws in the MySQL and Postgres databases, the Apache web server, and the BIND and djbdns name servers; we were also able to directly compare the resilience of functionally equivalent systems, such as MySQL and Postgres.
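As a rough illustration of mutating an abstract representation rather than raw file text, the sketch below treats a configuration as a dictionary mapping directive names to values and injects single-character typos into directive names. The three typo classes (omission, duplication, adjacent transposition) are a simplified stand-in chosen for this example, not ConfErr's psychology- and linguistics-based error models, and the names `typo_mutations` and `mutate_config` are hypothetical.

```python
from __future__ import annotations


def typo_mutations(word: str) -> list[str]:
    """Return single-typo variants of `word`: omission, duplication, transposition."""
    variants: list[str] = []
    for i in range(len(word)):
        variants.append(word[:i] + word[i + 1:])        # omit character i
        variants.append(word[:i] + word[i] + word[i:])  # type character i twice
        if i + 1 < len(word):
            # swap characters i and i+1
            variants.append(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    # Drop duplicates and the unchanged word, keeping first-seen order.
    unique: list[str] = []
    for v in variants:
        if v != word and v not in unique:
            unique.append(v)
    return unique


def mutate_config(config: dict[str, str]) -> list[dict[str, str]]:
    """Produce one mutated copy of `config` per typo in a directive name."""
    mutants: list[dict[str, str]] = []
    for key in config:
        for bad_key in typo_mutations(key):
            if bad_key in config:
                continue  # the typo collides with a real directive; skip it
            mutant = dict(config)
            mutant[bad_key] = mutant.pop(key)  # rename the directive, keep its value
            mutants.append(mutant)
    return mutants
```

Each mutant would then be serialized back into the target system's file format and the system restarted, with the outcome (rejected at startup, silently accepted, misbehaving) recorded toward a resilience profile; that harness is not shown.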

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Crossref

Managing Premium Data

Author: Upadhyaya Prasang
Publication date: 01/01/2015

Thesis (Ph.D.)--University of Washington, 2015

Data is transforming science, business, and governance by making decisions increasingly data-driven and by enabling data-driven applications. The data used in these contexts usually has significant economic or social value. Frequently, data is purchased from a provider where the price is linked to how the data will be used, and the allowed usage is typically detailed in a license agreement. Data processing, too, is moving to public clouds, where users must pay for access to cloud resources that are frequently shared by multiple users, especially when users analyze a common dataset. Current solutions to manage the economic value of data (prices and licenses) rely on expensive support from economists, auditors, and lawyers, thereby reducing the net value of data. Similarly, how to price shared cloud resources is poorly understood, and when pricing ignores the shared nature of use, cloud resources are significantly underutilized and users cannot realize the full value of their data. In this thesis, we develop novel, principled, and usable tools to manage data licenses and the pricing of data and of cloud-based data processing. We first present DataLawyer, a system to specify and enforce data use policies on relational databases. It includes an SQL-based formalism to precisely define policies, and novel algorithms to automatically and efficiently evaluate the policies. Experiments on a real dataset from the health-care domain demonstrate overhead reductions of up to 330× compared to a direct implementation of such a system on existing databases. Next, we present a new approach for selecting and pricing shared optimizations on the cloud by using Mechanism Design. We develop new mechanisms, where users bid for optimizations, to select and price additive and substitutive optimizations, and for the general setting where the users and their bids can change over time.
We show analytically that our mechanisms incentivize truthful bidding and ensure that the cloud never loses money. We show experimentally that our mechanisms yield higher utility than the state-of-the-art approach based on regret accumulation. Lastly, we present improvements to data APIs. APIs are a common way to buy data, but users can significantly overpay when they make multiple API calls and end up purchasing the same data item more than once. We provide a novel, lightweight, and fast method to support pricing where a buyer is charged only once for each purchased tuple, even across multiple API calls. To enable this, we present a pricing framework where buyers can obtain refunds for repeat purchases of data. We provide the protocols for refunds and develop optimizations to reduce the overhead of exercising refunds. Experiments show that data costs are significantly reduced (10× to 99×) for comparatively modest increases (2× to 5×) in query runtimes.
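The charge-once idea can be sketched from the buyer's side as a ledger of purchased tuple identifiers: each API call is charged in full, and any tuple the buyer already owns is refunded. This is a minimal sketch under the assumptions that tuples carry stable identifiers and that refunds are granted without any verification step; the thesis's actual refund protocols and overhead optimizations are not modeled, and `RefundLedger` is a hypothetical name. Prices are kept in integer cents to avoid floating-point rounding.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RefundLedger:
    """Buyer-side record that lets each tuple be paid for at most once."""

    price_per_tuple: int  # price of one tuple, in cents
    purchased: set[str] = field(default_factory=set)
    total_paid: int = 0

    def api_call(self, tuple_ids: list[str]) -> int:
        """Pay for a call's result in full, then claim a refund for repeats.

        Returns the refund for this call. Duplicates within a single call are
        also refunded, since only distinct new tuples are kept.
        """
        charged = self.price_per_tuple * len(tuple_ids)
        new_ids = set(tuple_ids) - self.purchased
        refund = charged - self.price_per_tuple * len(new_ids)
        self.purchased |= new_ids
        self.total_paid += charged - refund
        return refund
```

At 5 cents per tuple, a first call returning `t1, t2` costs 10 with no refund; a second call returning `t2, t3` is charged 10 and refunded 5 for `t2`, so the buyer has paid 15 for three distinct tuples.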

CiteSeerX

DSpace at The University of Washington

A Latency and Fault-Tolerance Optimizer for Online Parallel Query Plans

Author: Magdalena Balazinska
Prasang Upadhyaya
Yongchul Kwon
Publication date: 01/01/2011

We address the problem of making online, parallel query plans fault-tolerant, i.e., providing intra-query fault-tolerance without blocking. We develop an approach that not only achieves this goal but does so through the use of different fault-tolerance techniques at different operators within a query plan. Enabling each operator to use a different fault-tolerance strategy leads to a space of fault-tolerance plans amenable to cost-based optimization. We develop FTOpt, a cost-based fault-tolerance optimizer that automatically selects the best strategy for each operator in a query plan in a manner that minimizes the expected processing time with failures for the entire query. We implement our approach in a prototype parallel query-processing engine. Our experiments demonstrate that (1) there is no single best fault-tolerance strategy for all query plans, (2) hybrid strategies that mix and match recovery techniques often outperform any uniform strategy, and (3) our optimizer correctly identifies winning fault-tolerance configurations. Categories and Subject Descriptors: C.4 [Performance of Systems]: Fault tolerance, modeling techniques; H.2.4 [Database Management]: Systems
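The idea of a per-operator strategy space can be sketched as an exhaustive search over strategy assignments, assuming each operator's strategy has a fixed overhead plus a recovery time that is paid with some failure probability. This additive, independent-failure cost model is a deliberate simplification: FTOpt's actual model accounts for how operators in a pipelined plan interact under failures, which is why the sketch enumerates whole plans (so the cost function could be swapped for one that couples operators) instead of picking each operator's strategy in isolation. The operator names, strategies, and numbers are made up for illustration.

```python
from __future__ import annotations

from itertools import product

# costs[operator][strategy] = (overhead without failures, recovery time after a failure)
Costs = dict[str, dict[str, tuple[float, float]]]


def expected_time(plan: dict[str, str], costs: Costs, p_fail: float) -> float:
    """Expected time of a plan under a simple additive, independent-failure model."""
    return sum(costs[op][s][0] + p_fail * costs[op][s][1] for op, s in plan.items())


def choose_plan(costs: Costs, p_fail: float) -> tuple[dict[str, str], float]:
    """Enumerate every strategy assignment and keep the cheapest one."""
    ops = list(costs)
    best_plan: dict[str, str] = {}
    best_time = float("inf")
    for assignment in product(*(costs[op].keys() for op in ops)):
        plan = dict(zip(ops, assignment))
        t = expected_time(plan, costs, p_fail)
        if t < best_time:
            best_plan, best_time = plan, t
    return best_plan, best_time


costs: Costs = {
    "scan": {"none": (0.0, 100.0), "checkpoint": (10.0, 20.0)},
    "join": {"none": (0.0, 50.0), "checkpoint": (30.0, 5.0)},
}
```

With a failure probability of 0.25, `choose_plan(costs, 0.25)` picks the mixed plan (checkpoint the scan, leave the join unprotected) with expected time 27.5, beating both uniform plans (37.5 with no protection, 46.25 with checkpointing everywhere), which mirrors the abstract's observation that hybrid strategies can win.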

CiteSeerX

Crossref

Query-Based Data Pricing

Author: Bill Howe
Dan Suciu
Magdalena Balazinska
Paraschos Koutris
Prasang Upadhyaya
Publication date: 01/01/2012

Data is increasingly being bought and sold online, and Web-based marketplace services have emerged to facilitate these activities. However, current mechanisms for pricing data are very simple: buyers can choose only from a set of explicit views, each with a specific price. In this paper, we propose a framework for pricing data on the Internet that, given the prices of a few views, allows the price of any query to be derived automatically. We call this capability “query-based pricing.” We first identify two important properties that the pricing function must satisfy, called arbitrage-free and discount-free. Then, we prove that there exists a unique function that satisfies these properties and extends the seller’s explicit prices to all queries. When both the views and the query are Unions of Conjunctive Queries, the complexity of computing the price is high. To ensure tractability, we restrict the explicit prices to be defined only on selection views (which is the common practice today). We give an algorithm with polynomial-time data complexity for computing the price of any chain query by reducing the problem to network flow. Furthermore, we completely characterize the class of Conjunctive Queries without self-joins that have PTIME data complexity (this class is slightly larger than chain queries), and prove that pricing all other queries is NP-complete, thus establishing a dichotomy on the complexity of the pricing problem when all views are selection queries.
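To make "deriving a query's price from a few priced views" more concrete, the sketch below works through a toy special case; it is not the paper's network-flow algorithm. It assumes a single relation R(X, Y) with small, known column domains, priced selection views σ_{X=a} and σ_{Y=b}, and queries that are themselves selections on R, so each query can be modeled by the set of (x, y) cells whose membership it reveals. Under that assumption, a set of views determines a query when the views' cells cover the query's cells, and the price is the cheapest such cover, found here by brute force. The exhaustive search is exponential in the number of views, which is why polynomial-time algorithms such as the paper's reduction for chain queries matter; `price_query`, `selection_views`, and the example prices are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

Cell = tuple[str, str]  # an (x, y) value pair whose presence in R a query may reveal


def price_query(query_cells: set[Cell],
                views: dict[str, tuple[int, frozenset[Cell]]]) -> tuple[float, tuple[str, ...]]:
    """Return the cheapest (price, view names) whose cells cover `query_cells`."""
    names = list(views)
    best_price: float = float("inf")
    best_views: tuple[str, ...] = ()
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            covered: set[Cell] = set().union(*(views[n][1] for n in combo))
            if query_cells <= covered:
                price = sum(views[n][0] for n in combo)
                if price < best_price:
                    best_price, best_views = price, combo
    return best_price, best_views


def selection_views(dx: list[str], dy: list[str], px: int, py: int) -> dict[str, tuple[int, frozenset[Cell]]]:
    """Views sigma_{X=a} (price px each) and sigma_{Y=b} (price py each) over domains dx, dy."""
    views = {f"X={a}": (px, frozenset((a, b) for b in dy)) for a in dx}
    views.update({f"Y={b}": (py, frozenset((a, b) for a in dx)) for b in dy})
    return views
```

With X-views priced at 4 and Y-views at 2 over domains of sizes 2 and 3, the whole relation costs 6 (buy all three Y-views) rather than 8, and σ_{X=a1} costs 4. In this toy model the cover formulation also rules out arbitrage: if several queries together reveal all of a query's cells, their purchased views already form a cover, so their combined price can never undercut that query's price.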

CiteSeerX

Crossref

Toward Practical Query Pricing with QueryMarket

Author: Bill Howe
Dan Suciu
Magdalena Balazinska
Paraschos Koutris
Prasang Upadhyaya
Publication date: 01/01/2013

We develop a new pricing system, QueryMarket, for flexible query pricing in a data market based on an earlier theoretical framework (Koutris et al., PODS 2012). To build such a system, we show how to use an Integer Linear Programming formulation of the pricing problem for a large class of queries, even when pricing is computationally hard. Further, we leverage query history to avoid double charging when queries purchased over time have overlapping information, or when the database is updated. We then present a technique that fairly shares revenue when multiple sellers are involved. Finally, we implement our approach in a prototype and evaluate its performance on several query workloads.
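An integer program of this kind can be sketched as a weighted set cover, under the simplifying assumption that each query and each priced view is described by the set of data cells whose contents it reveals. In this hypothetical formulation, V is the set of priced views with prices p_v, C(v) is the set of cells view v reveals, C(Q) is the set of cells the query Q reveals, H is the subset of V the buyer has already bought, and x_v = 1 means view v is used to answer Q. Leaving views in H out of the objective is one simple way to express the no-double-charging idea; this is not QueryMarket's exact formulation.

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_{v \in V \setminus H} p_v \, x_v \\
\text{subject to}\quad & \sum_{v \in V \,:\, c \in C(v)} x_v \ge 1 && \forall c \in C(Q) \\
                       & x_v \in \{0, 1\}                          && \forall v \in V
\end{aligned}
```

The optimal objective value is the incremental price of Q: any previously purchased view the solver selects contributes nothing, so information the buyer already owns is not charged twice.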

CiteSeerX

Crossref

The Power of Data Use Management in Action

Author: Bill Howe
Dan Suciu
Magdalena Balazinska
Nick Anderson
Prasang Upadhyaya
Raghav Kaushik
Ravi Ramamurthy
Publication date: 01/01/2013

In this demonstration, we showcase a database management system extended with a new type of component that we call a Data Use Manager (DUM). The DUM enables DBAs to attach policies to data loaded into the DBMS. It then monitors how users query the data, flags potential policy violations, recommends possible fixes, and supports offline analysis of user activities related to data policies. The demonstration uses real healthcare data.
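One way to picture a policy attached to data is as a SQL query over a log of user activity that returns a row for every violation. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `usage_log` table and a made-up policy (no user may read the `patients` table in more than two queries). It naively re-runs the policy query after activity is logged, which is the straightforward approach rather than an optimized evaluation strategy, and it does not model fix recommendations; all table, column, and function names are assumptions.

```python
import sqlite3

# Hypothetical policy written as SQL over the usage log:
# it returns one row per violating user, so an empty result means "compliant".
POLICY_SQL = """
SELECT user_name, COUNT(DISTINCT query_id) AS n_queries
FROM usage_log
WHERE table_name = 'patients'
GROUP BY user_name
HAVING COUNT(DISTINCT query_id) > 2
"""


def make_monitor() -> sqlite3.Connection:
    """Create an in-memory database holding the usage log."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE usage_log (user_name TEXT, query_id INTEGER, table_name TEXT)")
    return conn


def record_query(conn: sqlite3.Connection, user_name: str, query_id: int, tables: list[str]) -> None:
    """Append one log row per table the query touched."""
    conn.executemany(
        "INSERT INTO usage_log VALUES (?, ?, ?)",
        [(user_name, query_id, t) for t in tables],
    )


def check_policy(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Evaluate the policy query and return any violations."""
    return conn.execute(POLICY_SQL).fetchall()
```

After three logged queries by `alice` against `patients`, `check_policy` returns `[('alice', 3)]`, while `bob`'s single query raises no flag.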

CiteSeerX

Crossref